Anveshak - A Groundtruth Generation Tool for Foreground Regions of Document Images

Authors

  • Soumyadeep Dey
  • Jayanta Mukherjee
  • Shamik Sural
  • Amit Vijay Nandedkar

Abstract

In this paper, we propose a groundtruth generation tool with a graphical user interface. The tool annotates an input document image on the basis of its foreground pixels. With user interaction, foreground pixels are grouped together to form labeling units, which the user then labels with user-defined labels. The tool outputs the image together with an XML file containing its metadata. The annotated data can then be used in various document image analysis applications.
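The abstract describes a pipeline: separate the foreground pixels, group them into labeling units, attach user-defined labels, and write the result as XML metadata. The Python sketch below illustrates that flow under stated assumptions and is not Anveshak's actual implementation or file format. It assumes a fixed global threshold for foreground extraction and uses 8-connected components as the initial units. In the paper the grouping is interactive, so a call to the hypothetical `merge_units` function stands in for the user's action. The labels (`noise`, `text`) and the XML schema (`document` and `unit` elements with bounding-box attributes) are likewise invented for illustration.

```python
from collections import deque
import xml.etree.ElementTree as ET

def foreground_mask(gray, threshold=128):
    """Assumption: dark ink on light paper, so pixels below `threshold` are foreground."""
    return [[px < threshold for px in row] for row in gray]

def connected_units(mask):
    """Group foreground pixels into 8-connected components (the initial labeling units)."""
    h, w = len(mask), len(mask[0]) if mask else 0
    seen = [[False] * w for _ in range(h)]
    units = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:  # breadth-first flood fill over the 8 neighbours
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                units.append(pixels)
    return units

def merge_units(units, indices):
    """Hypothetical stand-in for the user grouping several units into one."""
    chosen = set(indices)
    merged = [p for i in indices for p in units[i]]
    rest = [u for i, u in enumerate(units) if i not in chosen]
    return rest + [merged]

def bounding_box(pixels):
    """Return (x0, y0, x1, y1) for a list of (y, x) pixel coordinates."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(xs), min(ys), max(xs), max(ys)

def to_xml(image_name, units, labels):
    """Serialise labeled units into an invented XML schema (not the tool's real format)."""
    if len(units) != len(labels):
        raise ValueError("every unit needs exactly one label")
    root = ET.Element("document", image=image_name)
    for i, (pixels, label) in enumerate(zip(units, labels)):
        x0, y0, x1, y1 = bounding_box(pixels)
        ET.SubElement(root, "unit", id=str(i), label=label,
                      x0=str(x0), y0=str(y0), x1=str(x1), y1=str(y1),
                      pixels=str(len(pixels)))
    return ET.tostring(root, encoding="unicode")

# Tiny grayscale "page": two ink strokes and one isolated dark pixel.
gray = [
    [255,   0,   0, 255, 255, 255],
    [255,   0, 255, 255,   0, 255],
    [255, 255, 255, 255,   0, 255],
    [ 30, 255, 255, 255, 255, 255],
]
units = connected_units(foreground_mask(gray))      # three components
units = merge_units(units, [0, 1])                  # user groups the two strokes
xml_text = to_xml("page_001.png", units, ["noise", "text"])
print(xml_text)
```

Starting from connected components keeps the user's work to merging or splitting units rather than tracing pixels by hand. A real tool would also need a split operation and a richer schema.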




Publication date: 2016